Middlesex University’s Invisque Visual Analytics Tool

Supported By

University of Leeds Corpus Linguistics Technique

VAST 2011 Challenge
Mini-Challenge 3 - Investigation into Terrorist Activity

Authors and Affiliations:

Sharmin (Tinni) Choudhury, Middlesex University, t.choudhury@mdx.ac.uk [PRIMARY contact]
Chris Rooney, Middlesex University, c.rooney@mdx.ac.uk
Eric Atwell, University of Leeds, e.s.atwell@leeds.ac.uk
Claire Brierley, University of Leeds, scscb@leeds.ac.uk
Kai Xu, Middlesex University, k.xu@mdx.ac.uk
Raymond Chen, Middlesex University, r.chen@mdx.ac.uk
William Wong, Middlesex University, w.wong@mdx.ac.uk

Tool(s):

The primary tool used was Middlesex University’s INteractive VIsual Search and QUery Environment (INVISQUE). The INVISQUE user interface (UI), written in Adobe Flash, is supported by Java middleware that queries the MC3 dataset stored in a MySQL database.

 

In addition, the University of Leeds used a Python-based implementation of their corpus analysis algorithm to generate log-likelihood statistics for words in the MC3 news article corpus. These statistics were also stored in the MySQL database for ease of access.

 

A simple Python script was used to transfer the MC3 data and the University of Leeds analysis results, both of which were in simple text files, into the MySQL database.
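A transfer step of this kind can be sketched as follows. This is only an illustration, not the script we used: the tab-separated layout (article id, word, LL score), the `keywords` table, the function name `load_keywords` and the sample rows are all assumptions, and Python's built-in `sqlite3` module stands in for MySQL so that the sketch runs without a database server.

```python
import csv
import io
import sqlite3


def load_keywords(conn, lines):
    """Insert tab-separated (article_id, word, ll_score) rows into a table.

    The column layout and table name are hypothetical.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS keywords ("
        "article_id TEXT, word TEXT, ll_score REAL)"
    )
    rows = [
        (article_id, word, float(score))
        for article_id, word, score in csv.reader(lines, delimiter="\t")
    ]
    conn.executemany("INSERT INTO keywords VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# sqlite3 is a stand-in for MySQL; the sample rows are made up.
conn = sqlite3.connect(":memory:")
sample = io.StringIO("article_1\tbioterrorism\t25.4\narticle_1\tlecture\t9.1\n")
inserted = load_keywords(conn, sample)
```

With a real MySQL server the same pattern would apply through a MySQL driver, with the connection call and placeholder style adjusted to that driver.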

Video:

Invisque Video

ANSWERS:


MC 3.1 Potential Threats: Identify any imminent terrorist threats in the Vastopolis metropolitan area. Provide detailed information on the threat or threats (e.g. who, what, where, when, and how) so that officials can conduct counterintelligence activities. Also, provide a list of the evidential documents supporting your answer.

Finding from News Corpus

There were many interesting activities in Vastopolis, but most could not be considered “imminent terrorist” threats. The one that may pose an imminent threat involves equipment stolen from the lab of molecular biologist Professor Edward Patino. Prof. Patino has been harassed by the group Citizens for Ethical Treatment of Lab Mice, who in turn are affiliated with the Forever Brotherhood of Antarctica. The professor himself recently gave lectures on the threat of bioterrorism, and the Center for Disease Control (CDC) recently released a report highlighting the dangers of bioterrorism. Since the robbery of the professor’s lab, the Brotherhood and the Citizens for Ethical Treatment of Lab Mice have shown an increased level of activity. Lastly, dead fish have turned up in the Vast River. We therefore believe that there may be an imminent threat to the Vastopolis metropolitan area from the Forever Brotherhood of Antarctica and their affiliates, the Citizens for Ethical Treatment of Lab Mice, involving some form of biological weapon created with the equipment stolen from Professor Patino’s lab.

Table 1: Timeline of News Articles

Date of article    Event
11-04-2011         Prof. Patino gives lecture on bioterrorism
18-04-2011         CDC releases publication on threats of bioterrorism
26-04-2011         Prof. Patino’s lab gets robbed
02-05-2011         Mayor’s dog gets kidnapped
03-05-2011         Basketball team’s mascot goes missing from Vastopolis Dome
09-05-2011         Citizens for Ethical Treatment of Lab Mice send threatening emails to Vast Press
19-05-2011         Dead fish are found in Vast River

The other events in Vastopolis, which were discounted as either resolved or self-contained, include:

1.     Military weapons went missing from the Vastopolis Armed Forces on 26-04-2011, and on 30-04-2011 military-grade weapons were used in a park shootout in Southville. However, the weapons were recovered at the Vastopolis airport on 20-05-2011.

2.     Two mental patients affiliated with the psychobrotherhood escaped from the Vastopolis Center for the Criminally Insane on 27-04-2011 but were caught on 12-05-2011 while trying to make a bomb. No further information about the psychobrotherhood was available in the corpus.

3.     An Antarctica Airlines plane crashed and traces of explosives were found in the wreckage, but this is a past event. In addition, although there were articles about poor security at Vastopolis Airport, security was increased following the crash.

4.     A 60-year-old man built an improvised explosive device to kill his neighbor’s cat KeeKee, but that was a self-contained incident.

5.     A man with a bomb concealed in a turkey was stopped at Vastopolis Airport, but that news article did not provide any hooks for further investigation.

6.     The daughter of a military counter-intelligence agent was raped by another soldier and her identity was exposed, but the article provided no lead to follow.

7.     F-Alliance, a group of hackers made up of high-school dropouts, was arrested, so this is another resolved issue.

8.     Anarchists for Freedom issue daily threats to Vastopolis officials, but there is no evidence that they do more than bark.

9.     Lastly, Vastopolis was included in a general threat issued by the overseas terror group Network of Dread.

Analytics Process

We used the INteractive VIsual Search and QUery Environment (INVISQUE), a prototype visual analytics interface created at Middlesex University, to visually sift through the news corpus. INVISQUE uses an index-card visualization to represent individual information items, in this case the news articles, and arranges them on screen along X and Y axes. Figure 1 shows the results of the keyword search “bomb” arranged on the X axis by significance and on the Y axis by date, so that news articles with a higher significance for the keyword “bomb” appear further to the left and newer articles appear higher up the Y axis.

 


Figure 1: INVISQUE index-card visualization arranged on X-Y axis
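The arrangement described above can be sketched as a simple mapping from significance and date to screen coordinates. This is a minimal illustration, not the INVISQUE implementation: the `Card` class, the `layout` function, the canvas size and the linear scaling are all assumptions, as is the convention that the screen Y coordinate grows downward (so newer articles get smaller Y values).

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Card:
    title: str
    significance: float
    published: date


def layout(cards, width=1000.0, height=600.0):
    """Return (card, x, y) tuples: high significance left, newer dates on top."""
    if not cards:
        return []
    sig_hi = max(c.significance for c in cards)
    sig_lo = min(c.significance for c in cards)
    day_hi = max(c.published.toordinal() for c in cards)
    day_lo = min(c.published.toordinal() for c in cards)
    # Avoid division by zero when all cards share one value.
    sig_span = (sig_hi - sig_lo) or 1.0
    day_span = (day_hi - day_lo) or 1
    placed = []
    for c in cards:
        x = (sig_hi - c.significance) / sig_span * width
        y = (day_hi - c.published.toordinal()) / day_span * height
        placed.append((c, x, y))
    return placed


# Made-up sample cards for illustration.
cards = [
    Card("a", 50.0, date(2011, 5, 19)),
    Card("b", 10.0, date(2011, 4, 11)),
]
placed = layout(cards)
```

Here the most significant and most recent card lands at the top-left corner, and the least significant and oldest card at the bottom-right.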

The “significance” value was calculated by collaboration members from the University of Leeds, who performed keyword extraction on the news corpus. Keyword extraction is a standard Corpus Linguistics technique for genre classification which pinpoints statistically significant or "key" words for that genre via comparison with a general reference corpus. The significance calculation for the MC3 corpus entailed comparing the word frequency distribution in each of the 4474 news article test sets with the distribution in the entire news article dataset, which served as the reference corpus. The Leeds program verifies apparent overuse of lexical items in each article by computing the difference between the observed frequencies and the norm, as represented by their expected frequency in the whole dataset, expressed as a log likelihood (LL) statistic. Words with LL scores of 6.63 or above are statistically significant (p < 0.01, one degree of freedom).
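The calculation can be illustrated with the two-corpus log likelihood formula commonly used in corpus linguistics. This is a sketch under assumptions, not the Leeds program: the function names, the tokenised input and the sample counts are hypothetical, and the exact handling of the article's own counts within the reference corpus is not specified here.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """LL for a word seen a times in c article tokens and b times in d reference tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the article
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2.0 * ll


def key_words(article_tokens, reference_counts, reference_size, threshold=6.63):
    """Return {word: LL} for words overused in the article at or above the threshold."""
    counts = Counter(article_tokens)
    n = len(article_tokens)
    scores = {}
    for word, a in counts.items():
        b = reference_counts.get(word, 0)
        # Keep only overuse: relative frequency higher in the article.
        if a / n > b / reference_size:
            score = log_likelihood(a, b, n, reference_size)
            if score >= threshold:
                scores[word] = score
    return scores


# Made-up counts for illustration only.
tokens = ["bioterrorism"] * 5 + ["the"] * 5 + ["filler"] * 90
reference = {"bioterrorism": 20, "the": 60000, "filler": 90}
scores = key_words(tokens, reference, 1_000_000)
```

In this toy example "bioterrorism" is flagged as key, while "the" is dropped because it is relatively less frequent in the article than in the reference counts.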

 

Single-word searches, e.g. bioterrorism, on the INVISQUE interface are applied against the word list generated by the University of Leeds (see Figure 2) and generate index cards that have the matching keyword as the title and the significance of the keyword as the top-left value. Composite phrases, e.g. “Vast River”, are applied against the full article text; in this case the title of each index card is the most significant keyword in the article, as calculated by Leeds. Both cases are illustrated in Figure 3.

 


Figure 2: Table of Words in MySQL containing results of University of Leeds Keyword Extraction

 


Figure 3: Searching Using INVISQUE
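The two search paths can be sketched as follows. This is an illustration only: the in-memory dictionaries stand in for the MySQL tables, the `search` function and the sample data are hypothetical, and returning the top keyword's score for phrase matches is an assumption.

```python
def search(query, keyword_table, articles):
    """Return (article_id, card_title, significance) tuples.

    keyword_table: {article_id: {word: ll_score}}
    articles: {article_id: full_text}
    """
    q = query.strip().lower()
    results = []
    if " " not in q:
        # Single word: match against the keyword list.
        for article_id, words in keyword_table.items():
            if q in words:
                results.append((article_id, q, words[q]))
    else:
        # Composite phrase: match against the full text, then title
        # the card with the article's most significant keyword.
        for article_id, text in articles.items():
            if q in text.lower():
                words = keyword_table.get(article_id, {})
                if words:
                    top = max(words, key=words.get)
                    results.append((article_id, top, words[top]))
                else:
                    results.append((article_id, q, None))
    return results


# Made-up sample data for illustration.
keyword_table = {
    "a1": {"bioterrorism": 30.0, "lecture": 8.0},
    "a2": {"fish": 40.0, "river": 12.0},
}
articles = {
    "a1": "Professor gives lecture on bioterrorism.",
    "a2": "Dead fish found in Vast River.",
}
single = search("bioterrorism", keyword_table, articles)
phrase = search("Vast River", keyword_table, articles)
```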

As shown in Figure 4, the cards also show a “gist” of the article by displaying its three most significant keywords, the article title, the date of the article, and the Vastopolis locations mentioned in the article. The locations are extracted and appended to the cards by the middleware, based on a pre-compiled list. The cluster of returned cards can be filtered by any of the card features, and each card also has a shortcut to the full text of the article.


Figure 4: Index Card Fields
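How a card's fields might be assembled and filtered can be sketched as below. This is not the middleware code: the dictionary layout, the functions `build_card` and `filter_cards`, the substring matching of locations and the sample article are all assumptions made for illustration.

```python
def build_card(article, keywords, known_locations):
    """Assemble card fields from an article and its {word: ll_score} map."""
    gist = sorted(keywords, key=keywords.get, reverse=True)[:3]
    text = article["text"].lower()
    # Locations come from a pre-compiled list, matched by substring here.
    locations = [loc for loc in known_locations if loc.lower() in text]
    return {
        "title": article["title"],
        "gist": gist,
        "locations": locations,
        "date": article["date"],
    }


def filter_cards(cards, field, value):
    """Keep cards whose field equals value, or contains it for list fields."""
    kept = []
    for card in cards:
        current = card[field]
        if isinstance(current, list):
            if value in current:
                kept.append(card)
        elif current == value:
            kept.append(card)
    return kept


# Made-up sample article and scores for illustration.
article = {
    "title": "Dead fish found",
    "text": "Dead fish were found in Vast River near Southville.",
    "date": "19-05-2011",
}
keywords = {"fish": 40.0, "river": 12.0, "dead": 9.0, "found": 7.0}
known_locations = ["Southville", "Vast River", "Vastopolis Dome"]
card = build_card(article, keywords, known_locations)
by_location = filter_cards([card], "locations", "Vast River")
```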

As demonstrated in the accompanying video, the primary technique used to explore the provided corpus was visual searching and filtering. This technique allowed us to explore the corpus thoroughly and quickly, and we began to get a good picture of what was happening in Vastopolis within hours of beginning our exploration.